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This is a review of the book “Why Only Us” by Berwick & Chomsky 
done from the point of view of computing. The main conclusions 
are: that the book uses the word ‘recursion’ with a meaning that 
does not correspond to the meaning used in mathematics, causing 
confusion, and that the theory presented in the book, that syntax 
can be reduced to Merge, does not translate sensibly to computing, 
because Merge by itself cannot even parse, and therefore it is not 
enough to explain the evolution of Merge to explain the evolution 
of language, as the book proposes. 
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§1 Introduction 

Ifl • Being myself an electric engineer, I am an outsider both to linguistics and to evo- 
lution, and then I am not optimally suited to criticize a book subtitled “Language and 
Evolution”. So, yes, this is yet another review of “Why Only Us” (WOU) by Berwick & 
Chomsky (2016) (B&C), but this time from a eccentric point of view, computing. 

If 2 • I have learned a lot reading WOU, but every time I met the word ‘recursion’ I felt 
troubled. This happened surely because my understanding of recursion is different from 
the one used in linguistics, and therefore I should firstly explain what I think about 
recursion. I am very sorry, but we will need some mathematics. 
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§2 Mathematics 

§2.1 Recursion 

11 • Chomsky (2007), page 20, is wrong when he writes: “If the lexicon is reduced to a 
single element, then unbounded Merge will easily yield arithmetic.” The offending word 
is ‘easily’. The definition of Merge, in the same page, is: 


Merge (A", Y) = {A, Y} . 

Chomsky (2006), page 184, states it more precisely: “The most restrictive case of Merge 
applies to a single object, forming a singleton set. Restriction to this case yields the 
successor function, from which the rest of the theory of natural numbers can be developed 
in familiar ways.” The theory of natural numbers is also known as arithmetic. 

12 • Let us call the single element of the lexicon 0. Then, 


Merge(0) = Merge(0, 0) = {0, 0} = {0} , 


which is indeed a singleton set. Reiterating we obtain: Merge(Merge(0)) = Merge({0}) = 
{{0}}, Merge (Merge (Merge (0))) = Merge({{0}}) = {{{0}}}, and so on. In the construc- 
tion of the natural numbers by Zermelo (1908): 0 = 0, 1 = {0}, 2 = {{0}}, 3 = {{{0}}}, 
and so on. Now, rewriting, Merge(0) = 1, Merge(l) = 2, Merge(2) = 3, Merge(n) = n + l, 
and then Merge is indeed the successor function. So far, so good. 

13 ■ But the successor function is not enough to develop the theory of natural numbers. 
This question was answered by Kleene (1952) in his exhaustive investigation of recursion, 
an investigation that can be traced back to Kleene (1943), and in even to Kleene (1935). 
Kleene (1952) is the classical text book on recursion, and surely the most complete and 
instructive text on the subject, so we will use it here. 

14 • The successor function is only Schema (I), the simplest of the six schemata (I)-(VI) 
that are needed to define any recursive function of natural numbers, see pages 219 and 
279. The six schemata, where x' is the successor of x, and yyR(y) is the least (natural 
number) y such that predicate R(y) holds, are: 


(I) successor function: 


<p(x) = x' 


(II) constant functions: 


<fi(x i,. ■ .,x n ) = q 


(III) identity functions: 


<p(xi, ...,X n )=Xi 


(IV) substitution: 


■■■,X n )= V’OaO&l, ■ ■ ■ , Xn), ■ ■ ■ , Xm(xi, • ■ ■ , X n )) 


(V) primitive recursion: 

<p{ 0, X 2 ,..., Xn) = -0(^2, ■ ■ ■ , x n ) 
vW, X2,..., Xn) = x(y, <p(y, X2,...,X n ),X2,..., X n ) 
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(VI) general recursion: 

ip(xi, ...,x„)= fj.y[x(xi, ...,x n ,y) = 0 ] 

15 • Using primitive recursions the calculation goes down from y' to y, finishing if it 
reaches 0, and then using Schemata (I)-(V) we obtain total functions, and not all of 
them, but adding the conditional loop of general recursion the calculation can go up 
without bound, and then using Schemata (I)-(VI) we obtain all recursive functions, total 
and partial, as stated by Corollary to Theorem XIX in page 331 of Kleene (1952). 

16 • The conclusion is that in recursion the successor function is the simplest of the six 
schemata, and primitive and general recursion play the leading roles. The six schemata 
are needed to achieve full recursion, and the successor function is needed because it is the 
step that applied iteratively without bound can traverse through each and every member 
of the infinite enumerable set of natural numbers, so it is a necessary step to effectively 
enumerate the natural numbers. 

§2.2 Lambda-calculus 

11 • As the successor function is a restriction of Merge, we could think that the lack of 
power of the successor function is caused by the restriction, but that Merge is much 
more powerful. We will see that that is not the case investigating another formalism: 
lambda-calculus. Lambda-calculus is mathematically equivalent to recursion, as proved 
by Kleene (1936). The classical text books on lambda-calculus are Curry & Feys (1958) 
and Barendregt (1985). 

12 ■ The syntax of lambda-calculus can be defined inductively this way: 

x E V V C A 
x e V =>■ x e V 
x E V, M E A => {Xx M) E A 
M,N eA=> (MN) eA 

This means that in lambda-calculus there are only: an infinity of variables built by 
priming, which belong to the subset V ; lambda abstractions, that is, function definitions, 
{Xx M)] and function applications, {M N ). These last two are ordered pairs that grow 
into binary trees by induction. 

13 • Applications of Merge also grow into binary trees by induction, with the only differ- 
ence that in lambda-calculus order matters. We will call the version of Merge used in 
lambda-calculus cons, a name taken from Lisp, see McCarthy (1960). So cons takes two 
objects and returns the ordered pair containing the two objects, while Merge takes two 
objects and returns the two-element set containing the two objects. Order is the only 
difference between Merge and cons. 

cons (A", Y) = {XY) 

14 • In order to allay any doubt, we will define a version of lambda-calculus that uses 
Merge instead of cons, and then two-element sets instead of ordered pairs. 

x E V V C A 
xEV^x' EV 
x E V, M E A => {Xx, M} E A 
M,N E A => {ipM, N} E A 
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We have needed an additional terminal, 93 , to tag the function in function applications, 
so the phi-tagged element is the function, and the non-tagged element is the argument 
to apply. We can call it lambda-phi-calculus, and it is, except for its syntax, identical to 
lambda-calculus. 

If 5 ■ Iu lambda-calculus, cons is the operation that builds composed data structures, just 
as Merge is in lambda-phi-calculus, and in language. But we have only presented the 
syntax of lambda-calculus, that is, the enumeration of its well-formed formulas, also 
known as lambda-expressions. And lambda-calculus is a calculus because there is a way 
to transform a lambda-expression into another lambda-expression, a way called beta- 
reduction by Curry & Feys (1958). So beta-reduction is to lambda-calculus, as the set of 
substitution, primitive and general recursion is to recursion. In other words, in lambda- 
calculus, the role of substitution and recursions is played by beta-reduction. Following 
the analogy, the set of the natural numbers is infinite enumerable and it can be traversed 
using the successor function, the set of lambda-expressions is infinite enumerable and 
it can be traversed using the cons operation, and the set of lambda-phi-expressions is 
infinite enumerable and it can be traversed using the Merge operation. 

§2.3 Computing 

11 • The only form of recursion mentioned in WOU is computing, and then, though it is 
widely known, it demands an introduction, at least a short one. The classic reference of 
computing is its founding paper by Turing (1936), where he presents his a- machine, now 
deservedly known as Turing machine, and his universal computing machine, now known 
as universal Turing machine. A universal Turing machine is a Turing machine that takes 
the description M of any Turing machine M. and any data D as input data, and returns 
what the Turing machine A4 returns when it takes data D as input. In other words, a 
universal Turing machine is a Turing machine that can be programmed to emulate any 
Turing machine. 

12 • Computing is mathematically equivalent to recursion, and to lambda-calculus, as 
proved by Turing (1937), who uses some results already proved by Kleene (1936). The 
equivalence means that for each recursive function, which is any obtained using the six 
schemata, there is a Turing machine that performs the same calculation, and the converse, 
and similarly for lambda-defined functions. 

13 • All forms of recursion deal with infinite enumerable sets, and then they require some 
form of infinity, but in computing it is possible to separate cleanly the computing capacity 
from the infinities. Thus, a Turing machine has not time limitations, and it is composed 
by a finite state computing module attached to an infinite tape, which is just external 
memory. This way computing abstracts away the limitations of a device in speed and 
memory from its computing capacity. A universal Turing machine is a Turing machine, 
and therefore it is also a finite state computing module attached to an infinite tape. We 
say that the computing capacity of the finite state computing module of a universal Turing 
machine is Turing complete. Now, it is important to realize i) that it is possible to build 
physically a Turing complete device, and the CPU of a full programmable computer is an 
example, and ii) that a Turing complete device has the computing power of a universal 
Turing machine, though it can meet some time and external memory limitations that the 
universal Turing machine will not meet. 
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14 • At this point, I can already explain how I understand recursion and equivalent for- 
malisms, as lambda-calculus and computing. In any of them there is a base set that is 
infinite enumerable, and the key is that, assuming the Church- Turing thesis, a specified 
finite set of operations is enough to perform any effective calculation on the members 
of the base set. The base set is the set of natural numbers in recursion, the set of 
lambda-expressions in lambda-calculus, and the set of strings of symbols in computing. 
Of these three, only lambda-expressions are hierarchical, because both natural numbers 
and strings are linear. 

§3 Review 

11 • Now I ask you to read WOU as if you were me, that is, using the concepts seen 
in §2. In particular, I beg you please to be specially aware of the word ‘recursion’, and 
remember its meaning, as explained in §2, every time it is mentioned. 

§3.1 Uncontroversial, page 9 

11 • After stating that language is basically hierarchical and unbounded, and not linear, 
B&C write in page 9 (emphasis in the original): “Not only do hierarchical properties rule 
the roost in human syntax, they have no real upper bound, [...]. If one subscribes to the 
Church- Turing thesis along with the assumption that the brain is finite, then there is no 
way out: we require some notion of recursion to adequately describe such phenomena. 
So much is uncontroversial.” 

12 • I subscribe to the Church- Turing thesis, to the finiteness of the brain, and to the 
hierarchical and unbounded nature of syntax, and yet I had to write my first question 
mark near the word ‘ require ’ in my new and pristine copy of WOU. 

13 • To me, recursion is required to allow any possible effective calculation on an infinite 
enumerable set, be it hierarchical or linear, but I do not think they were meaning that, 
because they were insisting in the hierarchical and non-linear nature of language. On the 
other hand, I was only in page 9. 

§3.2 Iterated without bound, page 70 

11 ■ In page 70, B&C define “the simplest mode of recursive generation: an operation 
that takes two objects already constructed, call them X and Y, and forms from them 
a new object that consists of the two unchanged, hence simply the set with X and Y 
as members. We call this optimal operation Merge. Provided with conceptual atoms of 
the lexicon, the operation Merge, iterated without bound, yields an infinity of digital, 
hierarchically structured expressions.” This deserves two notes. 

12 • First: Merge is a simple operation, I concede, but it is not perhaps the simplest 
composing operation, or at least some argument should be provided to convince me. For 
example, it seems that for McCarthy (1960) cons is simpler than Merge, given that he 
implemented cons in LlSP, but not Merge. 

13 • Second: Syntax generation needs, in addition to Merge, an iterating operation to 
yield an infinity of expressions. Merge by itself does not iterate. As it is explained in 
the quoted text, Merge simply takes two objects and returns the two-element set that 
contains the two objects. So Merge is just a step. If Merge, or its result, needs some kind 
of iteration, then some other operation is needed. It is not Merge that yields infinities, 
it is iteration without bound. 


www.ramoncasares.com 20170925 


MisnotR 6 


§3.3 Optimally, page 71 

11 ■ Defending the Strong Minimalist Thesis, B&C write in page 71: “Optimally, recursion 
can be reduced to Merge.” My understanding of recursion is not compatible with that 
sentence, see §2. Or they are using ‘recursion’ to mean something related to hierarchical 
that I cannot precise, or they are ignoring, at least, an iterating operation that can 
be applied to Merge in order to generate sentences that are longer than two words, as 
explained in the second note of §3.2. 

12 • Recursion cannot be reduced to Merge, because Merge by itself cannot calculate every 
recursive function, and in particular it cannot parse. To calculate every recursive function, 
at least another operation playing the role that beta-reduction plays in lambda-calculus, 
see §2.2, is needed. That operation could be Unification, proposed by Jackendoff (2011). 
That Merge by itself cannot even parse is shown for example by Ginsburg (2016) 1 , who 
needs nine operations and rules complementing Merge to parse. 

§3.4 Protolanguage, page 72 

11 • I disagree twice with B&C in page 72. 

12 ■ B&C argue that there were no protolanguage, because or i) there is Merge, and then 
sentences are not bounded, so there is full language, or ii) there is not Merge, and then 
there are not sentences, so there is no language. I see it otherwise: i) if there is Merge, and 
no iteration but only Merge, then all sentences will be two-word sentences, as Progovac 
(2016) suggests, and ii) without Merge a protolanguage without syntax, or with one-word 
sentences, is possible, as Bickerton (2014) suggests. And these two possibilities are not 
mutually exclusive. 

13 • The second disagreement in page 72 is not important, but let us see it because it is 
repeated twice. B&C write: “Given two syntactic objects A", Y , Merge can construct a 
larger expression in only two logically possible ways: either X and Y are disjoint; or else 
one is a part of the other.” Why that word ‘logically’? In set theory it is also possible 
that sets X and Y have some elements in common, and some other elements that the 
other set has not, for example X = {two, words} and Y = (many, words}. So, perhaps 
‘syntactically’ there are only two ways, though this is not explained, but as it is, the 
statement is false. This ‘logical’ error is repeated in pages 99 and 128. 

§3.5 All computers implement Merge, page 98 

11 • B&C write in page 98: “Every computational system has embedded within it some- 
where an operation that applies to two objects X and Y already formed, and constructs 
from them a new object Z. Call this operation Merge.” This is plainly false. 

12 • A Turing machine, which is the prototype of computational system, does not do any 
Merge. The only operation of the Turing machine is an ‘if’ operation: if the current 
state is qi, and the read symbol is s r (which can be blank), then transition to state q n 
(which can be equal to qi or not), write symbol s w (which can be equal to s r or not, even 
blank), and move to the left, or to the right, or halt. Two other counterexamples are a 
NOT gate and a binary on-off switch; though simple, they do computations, so they are 
computational systems, and they do not Merge anything. 

13 • It is false and against B&C because, assuming that every brain is a computational 
system, this would imply that every brainy animal would implement Merge, contradicting 

1 Kindly pointed to me by Berwick. 
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that “Merge is one key evolutionary innovation for human language” (page 112). Further- 
more, if recursion could be reduced to Merge, as affirmed by B&C (page 71, see §3.3), this 
would imply that all brainy species implement recursion, against the thesis that recursion 
is unique to our species, a proposal by Hauser, Chomsky, and Fitch (2002). 

§3.6 Merge generates infinities, page 112 

11 • In page 112 there is a summary, so B&C are repeating some ideas about Merge and 
recursion that bother me. They redefine Merge again, and add: “Merge can then apply 
recursively to this new hierarchically structured syntactic object, yielding, for example, 
the guy read books. In this way, Merge builds an infinite array of hierarchically structured 
represent at ions . ” 

12 • Comparing this quote to that in page 70, see §3.2, it seems that now ‘recursively’ 
means ‘iteratively without bound’. Again, the fact that Merge itself cannot build infini- 
ties, nor do syntax parsing, is hidden behind imprecise wording. 

13 • Falsely secured behind that veil, WOU neither explains the evolution of the operations 
that iterate, nor explains the evolution of the rules that trigger and halt the iterations. 
And in the end, what makes WOU’s theory on the evolution of language lacking is 
precisely this lack of explanations of the evolution of the core computational mechanisms 
of recursion. Merge does not ‘easily’ yield recursion, and therefore the evolution of Merge 
does not explain the evolution of recursion; more explanations are needed. 

§3.7 Computational progression, page 131 

11 • Arguing against the idea of an evolutionary progression of computing capacities, B&C 
write in page 131: “It seems that insect navigation [. . . ] requires the ability to ‘read from’ 
and ‘write to’ simple tape-like memory cells. But if so, that’s all one needs for a Turing 
machine.” 

12 • Read from and write to external memory is needed for a Turing machine, but in 
addition to these interfaces an ‘if’ operation is needed to determine what to write given 
the symbol that was read, see the details in §3.5. 

13 • In addition, that every computational system can be modeled as a Turing machine 
does not imply that all are equal in computing power, the same way that one cannot 
reason that because a binary on-off switch is implemented with digital electronics, and a 
full programmable computer is also implemented with digital electronics, then both have 
the same computing power. 

14 • Only universal Turing machines can be programmed to execute any computable be- 
havior, and not every Turing machine is a universal Turing machine. There is then a 
very important distinction between a non-programmable basic calculator, with its fixed 
set of operations, and a full programmable computer, which is Turing complete because 
it has the computing capacity of a universal Turing machine. For example, any full pro- 
grammable computer, even one that has not embedded within it a Merge operation as 
a universal Turing machine, see §3.5, can be programmed to do Merge. On the other 
hand, most non-programmable computing systems cannot do Merge, as for example a 
NOT gate or a binary on-off switch, though you can implement a non-programmable 
computing system that can only do Merge. 

15 ■ Therefore, if, for example, the insect cannot be instructed (or programmed) to do 
arithmetic, as we can be, then it has not “climbed all the way up Nature’s ladder” (page 
132). For a positive view on computational progression, see my Casares (2017a). 
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§3.8 Any computation at all, page 134 

11 ■ The reader should be ready to understand what bothers me when I read the following 
sentence in page 134: “Consider a Turing machine, which after all ought to be able to 
carry out any computation at all.” This is misleading and confusing. 

12 • A specific Turing machine can carry out one specific computation, and one is not any. 
However, for any specific computation, there is a specific Turing machine that can carry 
it out. And a specific universal Turing machine can be programmed to carry out any 
computation at all, but not every Turing machine is a universal Turing machine. Part of 
the trick is to understand that the specific universal Turing machine is also carrying out 
one specific computation, that of emulating any specific Turing machine, which is given 
to the universal one as input data (the program). 

13 • This and the previous subsection show that an important computational distinction 
is ignored in WOU. I am referring to the distinction between Turing completeness, also 
known as universal computing, and unrestricted computing, or just computing, which is 
called type 0 by Chomsky (1959). 


§4 Discussion 

§4.1 Unfair 

11 • My review in section §3 only points to what sounds bad to me, giving a very negative 
impression of WOU. This is unfair. I have enjoyed reading WOU, and I agree with most 
of what is written in it. 

12 • This is a non-exhaust ive list of my agreements with what B&C have written in WOU: 
Universal Grammar is the object of language evolution, syntax is hierarchical, position in 
syntax hierarchy has meaning, syntax hierarchy is unbounded, randomness plays a role 
in evolution, selection cannot generate novelty, chimps are quite unlike us linguistically, 
language evolved for thought rather than for communication, there is only one language, 
our species is probably the first with a recursive language, language did not evolve in the 
last 60k years, and more. 

§4.2 Turing completeness 

11 ■ But there is one thing in WOU that I do not buy: that Merge is recursion. From the 
above, I guess that my understanding of recursion is not what is meant by recursion in 
WOU, and perhaps in linguistics generally. But, after reading WOU, I still think that 
using ‘recursion’ to mean ‘discrete and hierarchically unbounded’, or ‘iterated without 
bound’, is misleading. So I would suggest to do finer distinctions on the computing 
capacity of devices, and particularly I would urge anybody interested in the evolution 
of language to discriminate Turing completeness (universality) from general computing 
(type 0 of Chomsky 1959), reserving ‘recursion’ for ‘Turing completeness’. 

12 ■ Take for example the statement in page 126: “‘Counting’ languages are not natural 
languages.” That there are not natural counting languages seems to imply that Universal 
Grammar cannot count. But we can count. This would mean that the evolution of 
Universal Grammar could not explain why we can count. Therefore we would have to 
assume that there is some Universal Calculator that explains our counting, or arithmetic, 
capacity, and that it evolved separately from Universal Grammar. As no other species 
seems to count, this would ask for two very improbable and unique capabilities to evolve 
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separately. And note that arithmetic was the reason why recursion was investigated in 
the first place, see §2.1. 

13 • A solution to this counting problem is to assume that Universal Grammar can count, 
because it is recursive, but that during the critical period some restrictions are imposed 
in order to make language acquisition easier. I am meaning ‘recursive’ in the sense of §2, 
so to be clearer, I am saying that Universal Grammar is Turing complete; this is further 
elaborated in Casares (2017b). 

14 • Of course, this solution suggests that the evolution of language could not be exactly 
as presented in WOU; see my own ideas on the evolution of syntax in Casares (2017c). 
And this solution also explains the fact that we can learn artificial counting languages, 
asking us to distinguish first language acquisition from later languages learning. 

§4.3 Words 

11 • One aspect of the evolution of language that needs an explanation, but that B&C 
admit they cannot explain (page 90, and note 1 of Chapter 4 in page 173), is the evolution 
of the conceptual atoms of the lexicon, which we will call words. The thesis that Universal 
Grammar is Turing complete provides some hints for an explanation. 

12 • One of the more surprising mathematical properties of recursion is self reference. In 
any kind of recursion, be it proper recursion, lambda-calculus, computing, or else, an 
element of the base set can refer to any object of the recursive system. For example, 
Godel (1930) was able to assign a unique natural number to any arithmetic expression, 
or sequence of expressions, so he could write: “Thus we have a proposition before us 
which asserts its own unprovability” , where the proposition is more or less equivalent to 
the English sentence ‘this sentence is false’. 

13 ■ This is interesting by itself, but what is important here is that recursive self reference 
applied to language requires that any word could be a referent, that is, that any word 
could be referred to, and this transcends the word as a mere reference. Then, it seems 
to me that the transition from the word as a property of an object, and therefore always 
referring to the object, to the word as an object itself, and therefore autonomous and 
not necessarily referential, can be just a consequence of the “recursion-ization” of our 
brain. And this applies also to sentences and texts, that achieve also a full object status 
by which abstract definitions and unreal worlds are possible. 

14 • I am saying that the assignment of arbitrary sounds to external objects is a capability 
that dogs already have, as shown by Pavlov (1927), and that self reference is a property 
of any recursive system, and then, once Turing completeness is reached, self reference 
comes automatically for free. Therefore, when our Universal Grammar achieved Turing 
completeness, our language could at once refer not only to external objects but also to 
its own objects, thus extending our world enormously. Language exceeds reality. 

§4.4 Merge 

11 ■ B&C affirm that Merge is the simplest composing operation, but I think that cons is 
simpler than Merge. When order is prescribed by syntax, parsing is simpler and needs less 
memory, because then the parser always knows what to expect. On the other hand, I was 
convinced by B&C that order does not matter internally and that order is a requirement 
of the vocal interface. The question is then: Why did evolution choose Merge instead 
of cons? It could be just by chance, as B&C argue convincingly in WOU. Or perhaps 
Merge, being more flexible in generation, was selected. Or Merge was repurposed and its 
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original task demanded no order. This conld be the case if Merge was used by vision, 
where the input image is converted into a hierarchical 3-D model, according to Marr 
(1982), and also acknowledged in WOU page 136 though not listed in page 158. But this 
are just possibilities on which I cannot decide. 

12 • Let us recapitulate what we have seen of Merge. Merge is not recursion, because 
Merge by itself cannot calculate every recursive function, see §2, and there are versions 
of recursion that do not use Merge, as for example computing, see §3.5. 

13 ■ In addition, Merge by itself is of little use. If the two-element set created by Merge 
cannot be inspected nor transformed, then it is useless. Even adding substitution, Schema 
(IV) of recursion by which the result of a Merge can be applied to another Merge, see §2.1, 
the only use of Merge with substitution would be to create always bigger data structures, 
wasting memory. The conclusion is that Merge is useless, and even detrimental, except if 
it is complemented with other operations that can take advantage of the data structures 
generated by Merge, along with some rules that trigger and control its use. The compu- 
tational system presented by Ginsburg (2016), already mentioned in §3.3, which requires 
ten high level operations and rules (including Merge) to parse, is a good example that 
Merge by itself, without a carefully designed set of complementary operations, cannot 
parse. So Merge is just like a single part of a clockwork. 

14 ■ In summary, Merge by itself makes little computational sense, and then, if Merge 
were implemented alone by some mutation, it would be eventually discarded by natural 
selection. On the other hand, Merge can be used to implement recursion, as for example 
the version of lambda-calculus that we have called lambda-phi-calculus in §2.2. But, if 
our brains implement a hierarchical recursion that uses Merge, as B&C could propose, 
then they should explain the evolution of the whole machinery that implements that 
recursion in our brains. 

§5 Conclusion 

11 • Language evolution requires a multidisciplinary approach, because it is a complex 
subject touching different problems, and all of them have to be considered simultaneously 
in order to achieve satisfactory solutions. In an environment like this, it is mandatory 
to share a common vocabulary, and it would be advisable to share a common theory. 
Ideally, the theory should be simple, or otherwise it would be difficult to translate it to 
every discipline involved. But, as Einstein said: “Everything should be made as simple 
as possible, but no simpler.” 

12 ■ I have found WOU lacking in both: vocabulary and theory. WOU uses the word ‘re- 
cursion’ with a meaning that does not correspond to the meaning used in mathematics, 
which is the discipline where the word was born. That is unfortunate, because mathemat- 
ical recursion is a far reaching concept that has a very precise meaning, while recursion 
in WOU is a vague concept that blurs important computational distinctions damaging 
WOU’s theory of the evolution of language. 

13 • B&C’s theory in WOU starts with what they call the Basic Property of language, 
namely, that language has three parts: a syntax core and two interfaces to interact with 
two other systems, the conceptual and the sensorimotor systems (page 11). The core is 
a finite computational system that generates hierarchical and unbounded structures, and 
then it requires recursion (page 9). Assuming the Strong Minimalist Thesis, recursion can 
be reduced to Merge (page 71), and Merge is as simple as logically possible (page 111). 


www.ramoncasares.com 20170925 


MisnotR 11 


14 • After reducing syntax to Merge, B&C can explain the evolution of language. Merge 
is so simple that it could be the result of a minor mutation (page 70) causing a slight 
rewiring of the brain (page 79) that enhanced the conceptual system (page 101) and was 
luckily preserved by evolution (page 25). At this stage, Merge should be considered part 
of the conceptual system. Later, a more complex interface with the sensorimotor system 
was enabled, by evolution or rather by the plasticity of the brain, so that language could 
be used as a communication tool (page 108). Only then can we construe language as 
a third system that provides a new link between the conceptual and the sensorimotor 
systems. 

15 ■ Thus, the evolutionary theory of language in WOU depends exclusively on Merge, 
and consequently on Merge’s recursivity and computational power. But Merge is not 
recursion, and Merge cannot even parse, so syntax cannot be reduced to Merge. Merge 
by itself makes a very limited computational system, though Merge can be part of a 
Turing complete one, that is, part of a computational system that implements recursion 
in the mathematical sense. But then it is not enough to explain the evolution of Merge 
to explain the evolution of language, as WOU proposes. 

16 ■ In summary, the theory of language evolution proposed in WOU is too simple because 
the Strong Minimalist Thesis is too strong. The theory in WOU is so simple that it 
cannot be translated sensibly to computing, because it reduces syntax to Merge, and 
Merge by itself has not enough computational power to parse and even less to implement 
mathematical recursion. Therefore, in my opinion, WOU fails from the computational 
point of view. 

17 • Taking together the two issues that I see in WOU — confusing universal computing 
with (unrestricted) computing, and reducing recursion to Merge — we get a question that 
reveals WOU’s weakness. If insects already have enough computing capacity, as B&C 
believe (page 131), and language is just a minor mutation away, as B&C assume (page 
70), then Why did only we evolve language? Why Only Lis? 
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